


DIGITAL COMPUTER LABORATORY

UNIVERSITY OF ILLINOIS

URBANA, ILLINOIS



REPORT NO. 161 



DESIGN OF AN ITERATIVE PATTERN RECOGNITION PROCESSOR



by 

S. R. Ray and K. C. Smith



April 22, 1964 



This work is supported in part by 
Contract No. AT(11-1)-1018 of the Atomic Energy Commission






ACKNOWLEDGMENTS 

We wish to acknowledge an indebtedness to the originator and instigator
of the ILLIAC III project, Professor B. H. McCormick, and to Professor
R. Narasimhan, under whose guidance the study of iterative processing methods
has proceeded.

The aid of Messrs. Dennis Hall, Donald Dodson, Vincent Lum, and 
William Mayberry is also gratefully acknowledged. 



SUMMARY 

The logical design of the iterative processor of the ILLIAC III 
computer is described from an engineering viewpoint. The Iterative Array, a 
1024-bit modular parallel processor, and the Transfer Memory, a 48 X 1024-bit 
orthogonal read/write memory, are treated. Illustrative examples are presented 
which relate various elements of the hardware to its uses. 

This paper extends and deepens the system description of the Pattern 
Articulation Unit, discussed by McCormick. 






1. INTRODUCTION 

This paper discusses the engineering design of the iterative portion 
of a visual pattern recognition computer: ILLIAC III. General features and 
raison d'etre of the ILLIAC III have been treated previously [1]. 

It is convenient to view the ILLIAC III system as comprising two 
semi-autonomous processors: (1) a contemporary processor for arithmetic
operations; and (2) an iterative processor for nonarithmetic, planar (i.e., two-
dimensional) word operations. These two processors share a small matrix memory
called the Transfer Memory. The arithmetic processor communicates with this
memory in 48-bit words while the planar processor or "Iterative Array" communi-
cates with the orthogonal memory axis in 1024-bit (32 X 32) words. The Iterative
Array and Transfer Memory, together with their control unit, constitute an iterative
computer.

The Iterative Array plus Transfer Memory may equally well be described
as a content-addressable memory as seen from the arithmetic computer. Indeed,
the concepts of the iterative processor [2-4] and of the content-addressed
memory [5-9], originally considered as disparate notions, converge in practice
to synonymity. It will be convenient to employ the terminology developed in
content-addressed memory work to describe engineering features and illustrate
basic uses of the iterative processor.

The equipment which is considered here has been designed for visual
pattern processing—specifically, the automatic scanning, measuring, and data
reduction of bubble-chamber photographs. The engineering design incorporates
the results of simulation studies, including the concomitant development
of nonarithmetic reduction methods for an iterative organization that has proven
to be a practical, economical, balanced design [10].






A major effect of this orientation is seen in the Iterative Array, 
which provides only very fundamental logical operations in exchange for a rich 
and flexibly interconnected environment. Similarly, reflecting the relatively 
high ratio of Iterative Array to Transfer Memory operations, the inclusion of 
nondestructive read facilities in the latter could not be economically justified. 

The system organization will be briefly described in Sec. 2, and the
Transfer Memory is discussed in Sec. 3. The Iterative Array is then considered in
Sec. 4. The last section, Sec. 5, is devoted to examples of the use of various
parts of the equipment.



2. THE ILLIAC III SYSTEM 

A highly simplified diagram of the ILLIAC III system is given in 
Fig. 1. Short or "Z" words (48 bits) may be transferred in either direction 
between the Arithmetic Computer and Transfer Memory (TM). Long or planar words 
(1024 bits) are transferred between the TM and Iterative Array (IA) by way of 
the M register. Eight other long registers, S1-S8, exist in the IA. Each of the 
1024 modular sections of the IA, containing one bit of each of the nine fast 
registers, is identical and all are identically controlled. The modular sections 
are arranged in a two-dimensional square array of size 32 X 32 with a border of 
overflow bits. 

In content-addressed memory terminology, the M register is comparable 
to the interrogation response register. The encoding and marking operations 
required to retrieve the results of nonunique interrogation responses are also 
required in the present equipment but for a wider range of uses. A broader 
discussion of the system is available [1]. 



[Figure 1. ILLIAC III system. The Pattern Articulation Unit (PAU Control, Transfer
Memory, M register, registers S1-S8, and the Iterative Array, a 2-d logical
processor) and the Arithmetic Computer (Main Control, Main Core Memories) share
the Transfer Memory; short or Z words (48 bits) pass between the Arithmetic
Computer and the Transfer Memory, long or planar words (1024 bits) pass between
the Transfer Memory and the Iterative Array, and Photographic Scanners provide I/O.]



3. THE TRANSFER MEMORY

The Transfer Memory (Fig. 2) is a random access, linear select core 
memory of 48 X 1024 bits. Considered as a two-dimensional matrix, the memory 
is arranged for read or write operations either by rows or by columns. An 
encoder is also included in the Transfer Memory. 

The 1024 bit words, lying in the plane orthogonal to the Z axis, are 
called "plane" or "long" words while the 48-bit words are "Z" or "short" words
(aligned in the Z direction). 

Access to both word types is performed in a 2 μsec destructive-mode
cycle. A one-core-per-bit, linear selection organization is used. The cores 
are 29 X 19 X 6.5 mil ferrite switched to 80 per cent or more of the available 
flux. Read drives of 350 ma-t (corresponding to 200 nsec switching time) are 
applied by sink-source switches to one-turn windings. The total write mmf is 
240 ma-t developed by one-turn windings in the x,y direction carrying 120 ma and 
by two-turn windings in the Z direction which carry 60 ma. 

Access to Z words is straightforward, but the planar word access 
requires some elaboration. 

Plane Word Operations 

Plane words are treated at the circuitry level as eight 128-bit 
segments which are read or written simultaneously. The M register of the 
Iterative Array acts as the data register for plane words. 

There are two a priori requirements which effectively shape the detailed 
organization of this section of the memory. One requirement is that there exist 
the possibility of setting "1" (or "marking") into any specified unique bit of
the 1024-bit M register, a MARK operation. A full decoding of a 10-bit address 
is needed for this since the address occasionally originates from the control unit. 



[Figure 2. Transfer Memory.]

The second requirement, dictated by the need for storing plane words, 
is simply that there be (at least) one digit driver for each bit of the plane 
words, i.e., 1024 digit drivers.

These two requirements plus the generation of write current for Z 
words can be economically mated as described in the sequel. 

Referring to Fig. 3, it can be seen that the Plane Word Digit Driver
may be activated by the M register (containing the information to be written) 
during the writing phase of a plane-word operation cycle. In this case, the digit 
driver acts in the role which its name implies. 

A Z-word operation cycle may selectively activate one and only one of 
the 1024 Plane Word Digit Drivers, specified by the 10-bit Z address, whereupon
the driver produces Z word line write current. Moreover, the "digit driver" out- 
put may also be gated (by the Mark Gate) to mark the M register which is not 
directly engaged, of course, in the principal data manipulations for a Z word 
cycle. Thus, MARK operations may be selectively included in Z-word read or write 
cycles at no cost in time beyond that of performing the Z word operation alone. 

In summary, the costs of the large quantity of Plane Word Digit Drivers 
are amortized over three uses: as plane word digit drivers, as Z word write 
drivers, and as MARK signal generators with only relatively small costs in diodes 
and logic drivers required to select the various modes. 

The MARK operation may also be performed independently of storage 
operations by supplying Short Word Operation Gate and Mark Gate signals after 
having placed the necessary address in the Z register. In this case, the digit 
driver which is activated causes no irreversible core flux change since no other
drivers are simultaneously active.

Advantage is also taken of the shortness of the digit lines (WL lines), which
link only 48 cores and allow the use of a two-turn winding, thereby halving the
digit current and doubling the sense signal amplitude.

[Figure 3. Equipment for plane word operations and MARK. The Plane Word Digit
Driver is activated either from the M register through the Plane Word Operation
Gate or, through continued decoding of the Z Address Register (Partial Decoder),
through the Short Word Operation Gate; its output drives the plane words or the
54-bit Z word line, and may be gated by the Mark Gate onto the M register (Plane
Word Strobe and Z Word Read Driver also shown). MARK operation: set 1 in M
register position M_Z, where Z is the Z address, performed by supplying (1) Short
Word Operation Gate and (2) Mark Gate.]

The length

also makes it possible to control the digit current simply by series resistance 
and driver saturation. It should also be noted that the cores are operated with 
equal digit and write mmf's (120 ma-t) so that no change is required in the 
current-setting circuits when the driver changes from digit driver to write 
driver status. 

It was not found possible, unfortunately, to reduce greatly the cost 
of the sense amplifiers either directly or by multiple use tactics. 

Encoder 

The M register, which serves dually as the plane word data register 
of the Transfer Memory and as a register of the Iterative Array, delivers inputs 
to the encoder. The encoder consists of combinational circuitry which delivers 
(1) a 10-bit binary number (Z address) equal to the minimum coordinate or bit
number of the M register which contains "0" and (2) a one-bit signal ("SOME MARK") 
indicating the existence of at least one "0" in the M register. Zeros may be 
interpreted as interrogation responses, that is, those words for which an 
associative criterion is true. 

The encoded address of a "0" in M is developed and sent to the Z 
Address Register of the TM, where it may be used to erase (reset to "1") the
corresponding zero by means of the MARK operation, thus allowing the encoder to 
proceed to the next zero. 
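A minimal Python sketch of this retrieval loop follows; the list-of-bits model of
the M register and the name list_mark_addresses are illustrative assumptions
rather than features of the hardware.

    def list_mark_addresses(m_register):
        # The encoder reports SOME MARK while any "0" remains; its Z output is
        # the lowest-numbered position of M containing a zero.
        addresses = []
        while 0 in m_register:               # SOME MARK signal
            z = m_register.index(0)          # encoded Z address of the lowest zero
            addresses.append(z)              # e.g. compiled into an address list
            m_register[z] = 1                # MARK operation erases the response
        return addresses

    marks = [1] * 1024
    marks[6] = marks[100] = 0
    print(list_mark_addresses(marks))        # [6, 100]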

A similar procedure has been proposed [6, 11] for resolving nonunique 
interrogation responses in associative memory designs. It should be noted, 
however, that the address of a mark ("response") is more frequently important 
here than are the contents of a responding memory location. Thus, it is of 
paramount importance to encode addresses to the maximum entropy form (10 bits, 
here), a premise which has some effect upon the choice of encoder design, and is 
at variance with the usual requirements for content addressed memory. 



Although the encoder design is not a problem of major proportions,
it is a recurring one. It is of some interest, perhaps, that a design which
is at once fast and economical can be described by a simple heuristic.

The encoding strategy is conveniently described by conceptually
arranging the encoder inputs in a two-dimensional matrix. Assume the 10-bit
encoder output Z = (y,x), where the ordered pair y and x are the most and
least significant 5-bit groups of Z, respectively, and the numerical relation
Z = 32y + x holds. The minimum Z such that M_Z = 0 is desired. If the set of
values of (y,x) represent the row and column coordinates of the matrix in the
usual way (see Fig. 4), then the minimum numerical value of Z for which a zero
occurs in the matrix is found by first selecting the uppermost row containing
at least one zero and, in that row, the leftmost column containing zero. The
row and column thus selected clearly establish the minimum value of Z for which
a zero exists.

The logical realization of this procedure for a 1024-input encoder is
shown in Fig. 5.

The 32-input NOR circuits correspond to the rows and columns of the
matrix of Fig. 4. The priority circuit for the Y group (rows) effectively selects
the uppermost row containing zero, delivering y_i = 1 for that row (row i) and
forcing all other y signals to zero. The selected y_i then gates the 32 M signals,
corresponding to entries in its row, into a similar priority circuit where the
desired x is formed. Subsequent encoding of the Z address and generation of the
SOME MARK signal is obvious.
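The two-stage selection can be sketched in Python as follows, modeling the
M register as a list of 1024 bits; the function name encode and the tuple return
value are assumptions made for illustration.

    def encode(m_register):
        # Returns (SOME MARK, Z) with Z = 32y + x the lowest-numbered position
        # of the 1024-bit M register containing a zero.
        for y in range(32):                          # uppermost row containing a zero
            row = m_register[32 * y : 32 * (y + 1)]  # one 32-input group
            if 0 in row:
                x = row.index(0)                     # leftmost column containing the zero
                return True, 32 * y + x
        return False, None                           # SOME MARK absent

    m = [1] * 1024
    m[6] = 0
    print(encode(m))                                 # (True, 6), as in the Fig. 4 example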






The priority chain can be segmented for faster operation by sending
advanced inhibit signals to indicate to lower rows that some higher-priority
row has already been selected. In the limit of minimum segment length (one y
group), the encoded address can be attained in 11 circuit operation times, with
fanout from the inhibit drivers being 31 (= 2^n - 1, where n is the number of bits).



[Figure 4. Conceptual description of encoder logic. (a) Location of M register
bits: a 4 X 4 illustrative matrix holds M_0 through M_15, with row coordinate y
and column coordinate x each running from 0 to 3. (b) Example with 6 being the
desired encoding: the uppermost row containing a zero is y = 1, and the leftmost
column of that row containing a zero is x = 2, so Z = (y,x) = (1,2) = 0110_2 = 6,
i.e., M_6.]



[Figure 5. Encoder. The 1024 M inputs, in groups of 32 (M_0-M_31, M_32-M_63, ...,
M_992-M_1023), feed 32-input NOR circuits and a Y priority net; the selected y_i
gates its 32 M signals into an X priority net identical to the Y net; the priority
outputs are encoded into the Z address outputs and the SOME MARK signal, with
advance inhibit signals distributed to lower groups. (g = 1 implies a higher-
priority row (column) has a 0 in M.)]



In the present design, the chosen segment length is four groups, giving 22
circuit operation times for encoding and an inhibit driver fanout of 7 (= 32/4 - 1).

This encoder design can be described as two-stage, the two stages 
being the determination of y followed, in time, by x which, of course, is a 
function of y. A one-stage design, such as that given by Rogers and Wolinsky 
[11], has a higher maximum speed of operation but requires larger driver fanouts 
for a given speed. Also, it is well suited to production of a 1-in-1024
indication but less desirable for full encoding.

The components required to fabricate the encoder comprise about 800 
transistors and 4,000 diodes for 1024 inputs.

A plurality indicator (true if two or more zeros in M) is not included 
in the present design but is trivial to generate. 

Transfer Memory Operations 

The micro operations available in the Transfer Memory are given in 
Table 1. There are three species of operations: (1) Storage Access, (2) Mark
Operation Only, and (3) Storage Access and Mark Operation. The word type, 
direction of information flow, and origin of addresses are optional as indicated. 
Meaningless relations are indicated by XX; "Control" means that the address is 
supplied from the control unit in the usual way. 

The data transfer for plane words is always to and from the M register. 
Plane word address data is always obtained from the control unit. The encoded 
output address and SOME MARK signal are available as a special output data byte 
for use, typically, in address list compilation. 

TABLE 1. TRANSFER MEMORY OPERATIONS

    Operation Species          Word Type   Information Flow   Z Address Origin
    Storage Access             Planar      Read/Write         XX
                               Short       Read/Write         Control/Encoder
    Mark Only                  XX          XX                 Control/Encoder
    Storage Access and Mark    Short       Read/Write         Control/Encoder

Hardware

The components required for the total Transfer Memory include 12,000
transistors and 30,000 diodes; about 60 per cent of the components occur in the

plane word digit drivers, sense amplifiers and closely associated circuitry. 
Over 2,000 transistors of the total are involved in communication (cable driving) 
with the IA. 

Packaging is on 620 etched wiring cards in a
17" X 25" X 65" cabinet. Maximum dc one-cycle-average power consumption is two
kilowatts. 



4. THE ITERATIVE ARRAY

The Iterative Array (IA) is a 1024-module parallel processor comprising
nine planes of horizontal registers (see Fig. 6) interconnected to provide logical
functions of vertically-related cells (cells having identical x,y coordinates),
of nearest neighbors in the x and y directions, and various combinations of 
vertical and nearest neighbor cells. 

The capability of the circuitry at each x,y position of the Iterative 
Array is most clearly understood in terms of the logic diagram of Fig. 7. The 
state of any gate must be identical at all x, y positions. 

Simple Logic Functions 

The gates controlling the OUTBUS may be operated such that any one of 
the following functions of the ten indicated variables is formed on the 
OUTBUS:



f1 = M · S1 · S2 · ... · S8 · INBUS

f2 = ¬f1

f3 = ¬(M ∨ S1 ∨ S2 ∨ ... ∨ S8 ∨ INBUS)

Any subset of the ten variables may be included in the arguments; the
null set produces f1 = f3 = 1.

The OUTBUS function formed is made available to the eight neighboring 
modules as well as the originating module. Conversely, by symmetry, the OUTBUSes 
(1 to 8) of the nearest neighbor modules are available as inputs as well as the
self OUTBUS (0) and the self M bit. 

Either of the following functions may be formed on the INBUS and gated
into one or more of the registers:

g1 = M ∨ (OUTBUS_0 ∨ OUTBUS_1 ∨ ... ∨ OUTBUS_8)

g2 = ¬g1



[Figure 6. Transfer Memory-Iterative Array interconnection. The nine planes S1
through S8 and M are stacked along the plane-word-number (Z) axis above the
Transfer Memory; the Iterative Array carries an overflow border around the
32 X 32 array.]



[Figure 7. Logic diagram of one Iterative Array module.]



For no arguments, g1 = 0. A constraint exists on the register(s)
into which the INBUS may be gated, namely, that the receiving register may not
be a component of the argument of the function on the INBUS.

The formation and return of the functional value to a module
constitutes a single micro-operation with a timing of 0.5 μsec including gate
setup.
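A minimal Python sketch of these bus functions, as reconstructed above, is given
below; the dictionary-of-bits representation of a module's variables and the
function names are illustrative assumptions.

    def outbus_f1(values, args):
        # f1: AND of the selected variables; the empty argument set gives 1.
        return int(all(values[name] for name in args))

    def outbus_f2(values, args):
        # f2: complement of f1.
        return 1 - outbus_f1(values, args)

    def outbus_f3(values, args):
        # f3: NOR of the selected variables; the empty argument set gives 1.
        return int(not any(values[name] for name in args))

    def inbus_g1(values, args):
        # g1: OR of the selected variables (M and OUTBUS 0-8); no arguments gives 0.
        return int(any(values[name] for name in args))

    # One module's state as seen by the OUTBUS gates.
    state = {"M": 1, "INBUS": 0, **{f"S{i}": 1 for i in range(1, 9)}}
    print(outbus_f1(state, ["M", "S1", "S2"]))       # 1
    print(outbus_f3(state, []))                      # 1 (null argument set)

    bus = {"M": 0, **{f"OUTBUS{i}": 0 for i in range(9)}}
    print(inbus_g1(bus, list(bus)))                  # 0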

Stacking 

The stacking operation provides an analog or base 1 counting operation.
The subspecies, "STACK 1'S UP," may be defined by the replacement formula

Si' = INBUS · ¬Si · S(i-1)

where i = 1, 2, ..., 8, S0 = 1 (effectively), Si' means the new value of Si, and
the replacement is made only if Si' is "1". In words, the value of the INBUS
replaces the lowest-lying zero (if any) in S1 to S8, with the registers' physical
relation as indicated in Fig. 6.
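A Python sketch of STACK 1'S UP for a single module follows, assuming the
registers S1-S8 are modeled as a list of bits; repeated application gives the
base 1 count described above.

    def stack_ones_up(s, inbus):
        # s = [S1, ..., S8] as 0/1 bits; S0 = 1 effectively.
        prev = 1
        for i in range(8):
            if inbus == 1 and s[i] == 0 and prev == 1:
                s[i] = 1                      # the INBUS fills the lowest-lying zero
                break
            prev = s[i]
        return s

    # Repeated stacking gives a unary ("base 1") count: three pulses with
    # INBUS = 1 leave ones in S1-S3.
    s = [0] * 8
    for _ in range(3):
        stack_ones_up(s, 1)
    print(s)                                  # [1, 1, 1, 0, 0, 0, 0, 0]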

An example of the use of STACK is to provide a general threshold
function of the number of 1's and 0's in the vicinity of each coordinate position.
That is, by stacking, the function

Σ_j (a_j · V_j) + T

may be evaluated if the a_j, V_j, and T are integers, the variables V_j are all at,
or adjacent to, the x,y position under consideration, and no intermediate sum
falls outside the range 0-7.

Another use of STACK is in addition-subtraction (Sec. 5).

Two variants of the STACK operation are permitted: (1) STACK 1'S UP
and (2) STACK 0'S DOWN, in which the highest 1 is replaced by the complement of
the INBUS.

Commonly Used Microinstructions 

Some simple microinstructions having general utility are introduced
here and used in Sec. 5.

(1) Clear IA to 0's (1's).

This instruction sets all IA registers to 0 (or 1).
By activating no gates to the input OR circuit (see
Fig. 7), g1 = 0 is produced on the INBUS. The INBUS-
to-register gates may be activated to clear any
register(s) to zero; in particular, all nine registers
may be cleared. Use of g2 = ¬g1 obviously provides for
clearing to 1's.

(2) Replicate M 

This instruction causes the contents of M to be replicated 

in the eight storage elements of the same cell. 

Form g1 = M on the INBUS and gate the INBUS to S1, S2, ..., S8.

Hardware 

Each of the 1024 identical modules of the Iterative Array depicted
in Fig. 8 contains 39 transistors and 174 diodes used in circuits of mixed type.



[Figure 8. Iterative Array module.]

Saturated circuitry has been used where load variations are large, notably in 
the flipflops and output circuits. Current switching logic has been used for 
its level shifting capability and passive input properties in both complementing 
circuits. 

The IA design has been based on three major tenets. The first and 
most important of these has been to establish a fundamental logic in which many 
basic operations are simply and cheaply available. In the resulting design a 
total of 33 independent control inputs to each IA module provide in excess of 
one million useful gating combinations in each. 

A second premise has been to reduce the total power supply drain. To 
this end resistor gating with low standby power has been used in most cases. 
Also it has been arranged that the high-current stacking logic circuits be 
selected prior to use and idle at very low power levels at other times. 

The third requirement has been to distribute the load on the driver 
system as uniformly as possible. By this means one can avoid the extreme of an 
overly simple iterative module which, by its high driver requirements, unduly 
complicates the fanout of control to the entire array. Accordingly power gain 
has been introduced into the module so that a uniform fanout of 32 from all 
drivers may be used. Since the design has been intentionally well integrated it 
is only occasionally obvious where this has occurred. 



5. MEMORY CONTENT SEARCH OPERATIONS 

The succeeding exposition will illustrate the logical structure of 
the IA and TM in the role of memory content addressing and other macroinstructions. 
Several examples have been chosen to illustrate the nature of the calculus of 
various functions of memory content. 

These functions are categorized by noting that examination of the
memory contents, C, consists of evaluating a prescribed search function
f(C) = {f_z}, where z = 0, 1, ..., 1023 and f_z is the interrogation response for
memory word z. If f_z = f(C_z), that is, if f_z depends only upon the contents of
memory location z, then we say that f is a "vertical" function; if f_z = f(C_z*)
with z* ≠ z, then f is said to be a "horizontal" function; otherwise, if
f_z = f(C_z, C_z*), the search is "mixed." By extension, if a majority of arguments
come from words other than word z, the function is said to be "strongly horizontal."

In most nongeometrically interpreted searches, the vertical function is 
of primary import whereas all three types have applications in the present 
case. 

The following description will be couched in microinstruction terms, 
rather than a higher level language, in order to illustrate, most directly, the 
capabilities of the equipment. It should be understood that a typical user will 
be provided with a high level language which maps the true structure into a 
simplified one. 

Simple Search 

The object of a simple search is to identify words having a precise 
state in a specified group of bits or, equivalently, to evaluate a single minterm. 
This is a purely vertical function. After transferring the required variable 
planes from memory to the IA a function of up to seven variables (i.e., a 7-bit 
tag) can be processed in two microinstructions (at most). 


The search function may be expressed as a minterm, grouped, and the
complement function formed. A simple illustration will suffice. Assume the
function to be evaluated is x1·x2·¬x3·¬x4·x5 = g (or tag 11001) and that x_i is in
register Si. Then performing the following two operations in the IA

(S1 · S2 · S5) → S6
¬S6 ∨ S3 ∨ S4 → M (= ¬g)

leaves ¬g in M. The zeros in M thus correspond to Z words having the desired tag,
which may be read out in their entirety or address listed.
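A Python sketch of this search, with each register plane modeled as a list of
1024 bits, is given below; the function name simple_search and the bit-list
representation are assumptions for illustration.

    def simple_search(s1, s2, s3, s4, s5):
        # Each argument is a 1024-bit plane (list of 0/1), one bit per Z word.
        m = []
        for z in range(1024):
            s6 = s1[z] & s2[z] & s5[z]            # (S1 . S2 . S5) -> S6
            m.append((1 - s6) | s3[z] | s4[z])    # not-S6 v S3 v S4 -> M (= not-g)
        return m                                  # zeros of M mark the responding words

    # A word whose tag bits x1..x5 read 1 1 0 0 1 responds with a 0 in M.
    planes = [[0] * 1024 for _ in range(5)]
    for plane, bit in zip(planes, (1, 1, 0, 0, 1)):
        plane[42] = bit                           # give word 42 the tag 11001
    print(simple_search(*planes)[42])             # 0  (word 42 responds)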

Therefore, for a tag of n variables, n ≤ 7, and k responses to inter-
rogation, the operations are:

(1) n memory accesses for variables

(2) two (or one) IA micro-instructions
and

(3) k memory accesses (Storage Access and Mark) or address
listing operations (using MARK only).

For any n, the IA operations can be done in 3·[n/7] + 2 microinstructions,
at most, where [x] is the greatest integer less than x.

For long tags, simple searches are strongly memory-access-time limited 
as one would expect in a serial-by-bit readout. Actually, however, not only 
are tags normally very short (one to three bits for the principal cases) but the 
percentage of IA time used for purposes other than memory content search is 
expected to be high. On the other hand, "full associativity," the ability to 
use any bit (plane word) in the search function argument, is essential. 

The location of photographic transparency (brightness) contours, which 
may be encoded as a 3-bit gray scale function of film coordinates, is an example
of the use of a simple search on photographic data. 






Minimum or Maximum Search 



A search for the numerical minimum (or its dual, the maximum), a
vertical function, is readily performed. Consider a search for the minimum
three-bit tag. Using the Stack operation a number of times equal to the weight
of the tag digit under consideration, the procedure for tag bits t0, t1, and t2,
with weights 1, 2, and 4 respectively, is shown in the flow diagram and example
of Fig. 9.

The lowest-lying S plane containing at least one zero is found by 
copying planes S1-S8 into the M register consecutively (or using a binary chop) 
and examining the "SOME MARK" signal from the encoder. The marked words may 
then be read out or address listed. 

For tags longer than three bits, the preceding operation is performed 
for the most significant three bits, the result placed in M and replicated in all 
S registers (one microinstruction) thus writing ones in all columns which failed 
as possible candidates for minimum. Repetition of the stack and search operation 
for the remaining tag bits in groups of three or less will then eventually 
produce a minimum indication. 

Use of STACK 1'S UP from ¬M rather than M may be used to locate the
maximum tag(s).

Approximately 4n IA micro-operations and n memory accesses are required
for minimum or maximum search on n-bit tags; thus the speed is 4n microseconds
plus the time for address listing or reading of responding words.
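A Python sketch reproducing the outcome of this minimum search (though not its
plane-by-plane mechanics) follows; the bit-list representation of the tag planes
is an assumption for illustration.

    def minimum_search(t0, t1, t2):
        # t0, t1, t2: 1024-bit tag planes, least significant first.  The stack
        # height for each word equals its tag value after weighted stacking.
        heights = [t0[z] + 2 * t1[z] + 4 * t2[z] for z in range(1024)]
        lowest = min(heights)          # lowest-lying plane holding a mark is lowest + 1
        return [z for z in range(1024) if heights[z] == lowest]

    t0 = [1] * 1024; t1 = [1] * 1024; t2 = [1] * 1024
    t0[7], t1[7], t2[7] = 1, 0, 0      # word 7 carries the smallest tag, 001
    print(minimum_search(t0, t1, t2))  # [7]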

"Between limits" searches are also readily performed. 

[Figure 9. Minimum search. Flow diagram: CLEAR I.A.; READ t0 → M; STACK 1'S UP
IF M = 1 (perform one time); READ t1 → M; STACK 1'S UP IF M = 1 (perform 2 times);
READ t2 → M; STACK 1'S UP IF M = 1 (perform 4 times); FIND LOWEST-LYING PLANE
CONTAINING SOME MARK (zero). The accompanying example shows the memory planes
selected as tag digits and the I.A. at completion of stacking, the lowest-lying
plane containing at least one mark (zero) identifying the minimum.]

Addition and Subtraction

Addition in the vertical direction may be simultaneously performed on
1024 operand pairs stored in memory. The sum is produced serially by bit using

the following procedure which is substantially simpler than a usual half-addition 
sequence. 

Let the ith summand digits be a_i and b_i and let c_i be the carry into
the ith stage of addition. If the carry is retained in plane S1 and the sum
digit, s_i, returned to storage, the sequence of operations indicated in Fig. 10
will produce the desired result.

This addition algorithm requires about seven IA steps and three 
memory accesses per sum digit. 

Subtraction can be implemented by a similar procedure. Addition or 
subtraction using an operand in a geometrically-adjacent neighbor word differs 
from the foregoing only in the detailed choice of gates. This fact holds 
generally for all preceding operations due to the similarity of the self and 
neighbor gates into the IA module. 
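A Python sketch of one digit of this addition follows, tracking the unary count
that the two STACK operations leave in S1-S3 and forming the sum and carry as in
the note of Fig. 10; the function names are illustrative assumptions.

    def add_digit(a_i, b_i, c_i):
        count = c_i + a_i + b_i                  # unary count left in S1..S3 by stacking
        s1, s2, s3 = int(count >= 1), int(count >= 2), int(count >= 3)
        s_i = s3 | (s1 & (1 - s2))               # s_i = S3 v (S1 . not-S2)
        c_next = s2 | s3                         # c_{i+1} = S2 v S3
        return s_i, c_next

    def add_words(a_bits, b_bits):
        # a_bits, b_bits: lists of bits, least significant digit first.
        carry, total = 0, []
        for a_i, b_i in zip(a_bits, b_bits):
            s_i, carry = add_digit(a_i, b_i, carry)
            total.append(s_i)
        return total, carry

    print(add_words([1, 1, 0], [1, 0, 1]))       # ([0, 0, 0], 1), i.e. 3 + 5 = 8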

Graphic Search 

In the course of pattern manipulations, a digitized analog graph 
representing connectivity relations may be developed (see Fig. 11, plane S1).
The nodes of such a graph are determined by local processing [1] and placed in 
another plane as, for example, in plane S2 of the same figure. 

It is desired to convert the connection and node location information 
into an address list, or perhaps, to retrieve information from memory in 
graphically determined order. 

The procedure consists of setting a point in the initiation plane (M)
to 1, allowing the signal to propagate outward along the connection graph
plane lines until the propagating signal reaches a point specified as a node by
plane S2. Such points are then established as zeros in the M plane and listed.
Figure 12 shows the required logical structure of the IA modules, which is
established by gate selection.



[Figure 10. Addition. Operation sequence for sum digit i:
    CLEAR I.A. TO 0 (initially; c_0 = 0)
    a_i → M
    STACK 1'S UP IF M = 1
    b_i → M
    STACK 1'S UP IF M = 1
    ¬S2 → S4
    S1 · S4 → S5
    S3 ∨ S5 → M
    M → STORAGE (s_i)
    S2 ∨ S3 → S1 (= c_{i+1})
    0 → S2, S3, ..., S8
    ADVANCE i
Note: s_i = S3 ∨ (S1 · ¬S2), c_{i+1} = S2 ∨ S3, c_0 = 0. The figure also shows
the I.A. after one operation sequence for the cases a_i + b_i + c_i = 0, 1, 2, 3.]

[Figure 11. Example of graph search. The connection graph lines are 1's in
plane S1; nodes not yet listed are 0's in plane S2; an initiation point (= 1)
is set in M, and the points which would be indicated as a result of this stage
of search (after the node mark phase) are shown.]



[Figure 12. Logic connection during graphic search (propagation phase). The
module receives OUT_1 through OUT_8 from its eight nearest neighbors.
Propagation phase: (OUT_1 ∨ ... ∨ OUT_8) ∨ M → INBUS → S3; INBUS · S1 · ¬S2 →
OUTBUS. Node mark phase: ¬S2 → S4; ¬(S3 · S4) → M (desired nodes are 0's in M).]

The Flash gate (see Fig. 7) is active during the
signal propagation mode, which may consume about 3 μsec maximum.

This operation is an example of mixed search with a strongly horizontal 
component since the responding address may depend upon information in distant 
words. 
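A Python sketch of the graphic search follows, with planes S1 and S2 modeled as
32 X 32 bit arrays and the eight nearest-neighbor connections taken as OUT_1
through OUT_8; the breadth-first formulation and the function name are
assumptions for illustration.

    def graph_search(s1, s2, start):
        # Propagate from the initiation point along graph lines (1's in S1),
        # stopping at nodes (1's in S2); return the nodes reached.
        reached = {start}                      # cells whose INBUS has gone to 1 (plane S3)
        frontier = [start]
        while frontier:
            nxt = []
            for (x, y) in frontier:
                # A cell re-broadcasts only if it lies on a graph line and is not a node
                if s1[y][x] == 1 and s2[y][x] == 0:
                    for dx in (-1, 0, 1):
                        for dy in (-1, 0, 1):
                            if dx == dy == 0:
                                continue
                            nx_, ny_ = x + dx, y + dy
                            if 0 <= nx_ < 32 and 0 <= ny_ < 32 and (nx_, ny_) not in reached:
                                reached.add((nx_, ny_))
                                nxt.append((nx_, ny_))
            frontier = nxt
        # Node mark phase: desired points are reached cells that plane S2 marks as nodes
        return [(x, y) for (x, y) in reached if s2[y][x] == 1]

    # Tiny demo: a horizontal graph line from (1,5) to (8,5) with a node at (8,5).
    s1 = [[0] * 32 for _ in range(32)]
    s2 = [[0] * 32 for _ in range(32)]
    for x in range(1, 9):
        s1[5][x] = 1
    s2[5][8] = 1
    print(graph_search(s1, s2, (1, 5)))        # [(8, 5)]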

Adaptive Operations 

Various adaptive pattern classification operations can be programmed 
quite directly for the system. The correspondence to the "learning matrix" form 
of conditioned correlation devices is perhaps the most direct [12]. 

Consider, for example, 45 linear word bits of the TM to correspond to
15 pattern classes C_i, with three bits representing the conductance to each of
the 1024 orthogonal lines which correspond to feature sets x_j.

In the learning phase, the feature set may be loaded into the M 
register and temporarily stored in one word of the TM. 

Using the three bits corresponding to the prescribed pattern class to
transfer relevant planes to the IA, the weighting(s) of the connection(s)
described by the feature class may be simultaneously altered by addition or
subtraction. On the other hand, if desired, the feature class input may be used
to address the pertinent Z word(s), which may be transferred to the arithmetic
computer for alteration.

During the knowing phase, a presented feature set, for example,
containing relatively few marks, x_j, may be used to read the conditional prob-
ability vectors from memory into the arithmetic computer. That is, the Pr(C_i|x_j)
vectors (Z words) for the active values of j in the feature set may be con-
veniently transferred to the arithmetic calculator where the desired pattern
class, C, may be calculated according to



C = max_i {Pr(C_i | x_j)}



Upon specifying a pattern class, the corresponding feature set may 
be determined by a maximum search in the IA.

Although the use of this system in adaptive operations has not been
investigated deeply, it is clear that the flexibility provided by logical-
arithmetic computations on orthogonal axes is much greater than for a one-axis
computer. The high loading rate of 2 nsec/bit (96 μsec per core load) for the
Transfer Memory is well adapted to searches requiring frequent core reloading;
hence, the simulation of sizable matrices can reasonably be contemplated.
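A Python sketch of the knowing phase follows; the report does not state how the
per-feature vectors are combined, so simple summation before the maximum is an
assumption, as are the function name and data layout.

    def classify(active_features, prob, n_classes=15):
        # prob[j] is the 15-element vector Pr(C_i | x_j) read from the Z word
        # addressed by feature j; evidence is combined here by simple summation.
        scores = [0.0] * n_classes
        for j in active_features:
            for i in range(n_classes):
                scores[i] += prob[j][i]
        return max(range(n_classes), key=lambda i: scores[i])

    # Hypothetical two-feature example: class 3 dominates both vectors.
    prob = {10: [0.0] * 15, 20: [0.0] * 15}
    prob[10][3] = 0.9
    prob[20][3] = 0.8
    print(classify([10, 20], prob))          # 3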



6. CONCLUDING REMARKS 

The equipment which has been described provides an example of an 
iterative processor of practical design oriented toward pattern recognition 
problems. In particular, methods for the visual data processing of bubble-chamber
negatives have been established for this processor.

The economic attainment of the necessary processing generality can be 
seen to rest largely upon the richly interconnected but programmable nature of 
the Iterative Array modules. Such arrangements are no longer impractical to
program because of recent advances in the knowledge and development of high-level
languages.

Stacking, by providing an easily interpreted base one or analog count, 
is of general use in threshold logic operations. The increased facility for
decision making which it provides has been illustrated in the implementation 
of an addition algorithm. 

The Flash procedure provides a rapid means of path-following which was 
illustrated for the case of conversion of a digitized analog graph to an address 
list or digital graph. It is also useful in generating mask words, checker- 
boards, and various regular patterns. 

The Transfer Memory, acting in its primary role as a storage unit to 
the IA, necessarily includes facilities for reading/writing of 1024 bit words. 
In addition the realization of the MARK operation, and the reading/writing of 
short words (for communication with the associated arithmetic computer of more 
conventional design) were shown to be conveniently integrated to economic 
advantage.

In addition to its basic purpose, it is reasonable to expect that this 
processor will also be used as an experimental research tool for investigation 
of the application of iterative methods to other problems. This new facility 




is especially significant since a computer organized along conventional lines 
is decidedly ill-adapted to simulation of parallel iterative processing. In 
particular it has been noted that the organization of the processor has a 
number of advantages for studying on a large scale the application of adaptive 
correlation techniques to various problems. 



FOOTNOTES 

Received          . This work is supported in part
by Contract No. AT(11-1)-1018 of the Atomic Energy Commission.

Digital Computer Laboratory, University of Illinois, Urbana, Illinois.

Fifty-four cores actually exist in the matrix, only 48 of which are typically
used for information storage.

This depends upon technology and basic circuit repertoire, of course; the
numbers quoted are for typical contemporary diode-transistor logic.

A mean time of 330 nsec for the present design.

This function evaluation is a macroinstruction.

If ones occur in a three-bit group of all candidate words, then S8 would be the
valid plane, containing the useful accumulated marks.

4n micro-operations of 0.5 microseconds plus n memory accesses of two
microseconds.



■36- 



BIBLIOGRAPHY 

[1] B. H. McCormick, "The Illinois Pattern Recognition Computer--ILLIAC III,"
IEEE Trans. on Elec. Computers, vol. EC-12, pp. 791-813; Dec., 1963.

[2] S. H. Unger, "A Computer Oriented Toward Spatial Problems," Proc. IRE,
vol. 46, pp. 1744-1750; Dec., 1958; and "Pattern Detection and Recognition,"
Proc. IRE, vol. 47, pp. 1737-1752; Dec., 1959.

[3] J. H. Holland, "Iterative Circuit Computers," Proc. of WJCC, pp. 259-265;
1960.

[4] J. Gregory and R. McReynolds, "The SOLOMON Computer," IEEE Trans. on
Elec. Computers, vol. EC-12, pp. 774-781; Dec., 1963.

[5] A. E. Slade and H. O. McMahon, "A Cryotron Catalog Memory System," Proc.
of EJCC; Dec., 1956.

[6] J. R. Kiseda, H. E. Petersen, W. C. Seelbach, and M. Teig, "A Magnetic
Associative Memory," IBM JRD, 5, No. 2, pp. 106-121; April, 1961.

[7] R. R. Seeber and A. B. Lindquist, "Associative Memory with Ordered
Retrieval," IBM JRD, 6, No. 1, pp. 126-136; Jan., 1962.

[8] P. M. Davies, "A Superconductive Associative Memory," AFIPS, Proc. 1962
Spring JCC, San Francisco, Cal. (May 1-3, 1962), 79-88.

[9] R. H. Fuller, "Content-Addressable Memory Systems," Dept. of Engineering,
UCLA, Report No. 63-25; June, 1963.

[10] R. Narasimhan, "Syntactic Descriptions of Pictures and Gestalt Phenomena
of Visual Perception," Digital Computer Lab., Univ. of Illinois, Report
No. 142; July, 1963.

[11] J. L. Rogers and A. Wolinsky, "Associative Memory Algorithms and their
Cryogenic Implementation," Report No. 8670-6007-RU000, Thompson-Ramo
Wooldridge Space Technology Laboratories; Dec., 1963.

[12] H. Kazmierczak and K. Steinbuch, "Adaptive Systems in Pattern Recognition,"
IEEE Trans. on Elec. Computers, vol. EC-12, pp. 822-835; Dec., 1963.
